9 research outputs found

    Inferring epigenetic and transcriptional regulation during blood cell development with a mixture of sparse linear models

    Get PDF
    Motivation: Blood cell development is thought to be controlled by a circuit of transcription factors (TFs) and chromatin modifications that determine the cell fate through activating cell type-specific expression programs. To shed light on the interplay between histone marks and TFs during blood cell development, we model gene expression from regulatory signals by means of combinations of sparse linear regression models. Results: The mixture of sparse linear regression models was able to improve the gene expression prediction in relation to the use of a single linear model. Moreover, it performed an efficient selection of regulatory signals even when analyzing all TFs with known motifs (>600). The method identified interesting roles for histone modifications and a selection of TFs related to blood development and chromatin remodelling. Availability: The method and datasets are available from http://www.cin.ufpe.br/~igcf/SparseMix. Contact: [email protected] Supplementary information:Supplementary data are available at Bioinformatics online

    CpG-depleted promoters harbor tissue-specific transcription factor binding signals—implications for motif overrepresentation analyses

    Get PDF
    Motif overrepresentation analysis of proximal promoters is a common approach to characterize the regulatory properties of co-expressed sets of genes. Here we show that these approaches perform well on mammalian CpG-depleted promoter sets that regulate expression in terminally differentiated tissues such as liver and heart. In contrast, CpG-rich promoters show very little overrepresentation signal, even when associated with genes that display highly constrained spatiotemporal expression. For instance, while ∼50% of heart specific genes possess CpG-rich promoters we find that the frequently observed enrichment of MEF2-binding sites upstream of heart-specific genes is solely due to contributions from CpG-depleted promoters. Similar results are obtained for all sets of tissue-specific genes indicating that CpG-rich and CpG-depleted promoters differ fundamentally in their distribution of regulatory inputs around the transcription start site. In order not to dilute the respective transcription factor binding signals, the two promoter types should thus be treated as separate sets in any motif overrepresentation analysis

    Statistical Modeling of Transcription Factor Binding Affinities Predicts Regulatory Interactions

    Get PDF
    Recent experimental and theoretical efforts have highlighted the fact that binding of transcription factors to DNA can be more accurately described by continuous measures of their binding affinities, rather than a discrete description in terms of binding sites. While the binding affinities can be predicted from a physical model, it is often desirable to know the distribution of binding affinities for specific sequence backgrounds. In this paper, we present a statistical approach to derive the exact distribution for sequence models with fixed GC content. We demonstrate that the affinity distribution of almost all known transcription factors can be effectively parametrized by a class of generalized extreme value distributions. Moreover, this parameterization also describes the affinity distribution for sequence backgrounds with variable GC content, such as human promoter sequences. Our approach is applicable to arbitrary sequences and all transcription factors with known binding preferences that can be described in terms of a motif matrix. The statistical treatment also provides a proper framework to directly compare transcription factors with very different affinity distributions. This is illustrated by our analysis of human promoters with known binding sites, for many of which we could identify the known regulators as those with the highest affinity. The combination of physical model and statistical normalization provides a quantitative measure which ranks transcription factors for a given sequence, and which can be compared directly with large-scale binding data. Its successful application to human promoter sequences serves as an encouraging example of how the method can be applied to other sequences

    Predicting gene expression in T cell differentiation from histone modifications and transcription factor binding affinities by linear mixture models

    No full text
    Abstract Background The differentiation process from stem cells to fully differentiated cell types is controlled by the interplay of chromatin modifications and transcription factor activity. Histone modifications or transcription factors frequently act in a multi-functional manner, with a given DNA motif or histone modification conveying both transcriptional repression and activation depending on its location in the promoter and other regulatory signals surrounding it. Results To account for the possible multi functionality of regulatory signals, we model the observed gene expression patterns by a mixture of linear regression models. We apply the approach to identify the underlying histone modifications and transcription factors guiding gene expression of differentiated CD4+ T cells. The method improves the gene expression prediction in relation to the use of a single linear model, as often used by previous approaches. Moreover, it recovered the known role of the modifications H3K4me3 and H3K27me3 in activating cell specific genes and of some transcription factors related to CD4+ T differentiation.</p
    corecore